In this paper, adapted from the author's PhD thesis, we present otherwise unpublished results 
relating to global control schemes, culminating in the calculation of a fault-tolerant threshold for 
one such scheme. As with early fault-tolerant threshold results, the aim is to calculate a positive 
number, not to optimise it. We also discuss how the results might affect other related schemes, such 
as those based on cellular automata. 



In some physics settings, such as optical lattices, while we can initialise states of a large number of qubits, the control
of individual qubits is particularly challenging. In other systems, while single-qubit addressing may be feasible, the
scaling requirements of needing many control elements for each and every qubit, and for the interactions between
qubits, make device structures very complex, and it would be desirable to reduce these. In these realisations, we
should see if an architecture can be designed that is suited to the physical situation, ensuring that it is at least as
powerful as the theoretical architecture for Quantum Computation, by demonstrating a universal set of quantum
gates.

In order to avoid single-qubit addressing, we assume that we have a set of fields that we can control. These fields
address the whole device in some way, and it is our task to see how to compose these to give universal computation,
and to address the parallelism requirements for error correction and fault-tolerance. In the early sections, we present
a review of the global control scheme we use, including the required structures for error correction and
fault-tolerance of the computational qubits [1] and the auxiliary (classical) states [2]. We then detail the calculation of a
fault-tolerant threshold in this restricted scenario, comparing the trade-offs of different assumptions. This is adapted
from [1].

I. INTRODUCTION TO GLOBAL CONTROL

A. State Transfer by Global Control

A relatively simple protocol that can be used to demonstrate some of the ideas of global control is that of state
transfer. The scenario is that we start with a chain of qubits (Fig. [1]), and the qubit at one end has some quantum
state, $|\psi\rangle$, stored on it. We would like to transfer this state to the opposite end of the chain$^{1}$. A typical way to do
this would be to perform SWAP gates one at a time so that the state is discretely moved from qubit 1 to qubit
2, and then from qubit 2 to qubit 3 and so on. However, we can do exactly the same with global control. In this
situation, we only allow pulses to be sent to the entire device. For example, we could allow a global pulse to turn on
an interaction between alternate pairs of qubits. Specifically, one might think of turning on an interaction of the form



$$H_1 = \sum_n \left( X_{2n} X_{2n+1} + Y_{2n} Y_{2n+1} \right)$$



which gives a series of SWAP operations. By alternating this with a second global field that activates the interaction 



$$H_2 = \sum_n \left( X_{2n-1} X_{2n} + Y_{2n-1} Y_{2n} \right)$$

you can quickly convince yourself that it is possible to achieve the desired state transfer. It is pair-wise interactions 
of this form that we will use to create a general quantum computation protocol with global fields. This state transfer 
protocol has two notable advantages over the permanently coupled spin chain known from studies of perfect state 
transfer 1,10. Firstly, we can perform the transfer whenever we want, without the complications of moving the 
state onto an ancillary device. Secondly, the transfer occurs independently of the states of the other qubits in the 
system; there are no controlled-phase gates applied during transfer. 
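
The alternating-pulse transfer can be sketched classically by tracking where each state sits on the chain. The sketch below treats each global pulse as an ideal layer of SWAPs, as the text does. The function names and the 0-indexed site labels are assumptions of this illustration, not part of the original scheme.

```python
def swap_layer(chain, start):
    """Swap neighbouring pairs (start, start+1), (start+2, start+3), ... in place."""
    for i in range(start, len(chain) - 1, 2):
        chain[i], chain[i + 1] = chain[i + 1], chain[i]


def transfer(chain):
    """Alternate the two global SWAP layers len(chain) - 1 times.

    Sites are 0-indexed here (an assumption of this sketch): the layer with
    start=0 stands for one global field and the layer with start=1 for the other.
    """
    chain = list(chain)
    for step in range(len(chain) - 1):
        swap_layer(chain, step % 2)
    return chain


if __name__ == "__main__":
    # The state initially on the first site ends up on the last site.
    print(transfer(["psi", "0", "0", "0", "0", "0"]))
```

Every pulse acts on the whole chain, yet the state that starts on the first site advances by exactly one site per pulse.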
1 We know how to do this with no time-varying interaction by engineering a fixed interaction Hamiltonian, as in [H, @, 0i but let us
disregard this for the moment.




FIG. 1: We can transfer a quantum state from one end of the chain to the other end, simply by alternately applying the
Hamiltonians $H_1$ and $H_2$.
FIG. 2: Method for creating a two-qubit gate between distant qubits. The sequence up to the swaps places the information
of the qubit Y onto the CU ($|e_1 e_2\rangle = \alpha|10\rangle - \beta|01\rangle$ where $|Y\rangle = \alpha|0\rangle + \beta|1\rangle$), and subsequently this is used to act upon
the target qubit, X. After the controlled-U step, all previous steps must be undone to disentangle the CU. Part (a) translates
the CU into a unique local pattern in the A qubits. Part (b) flips B qubits if they are surrounded by two $|1\rangle$s. This is the
mechanism that can be used to switch on and off sets of CUs. 



B. Quantum Computation 

We shall now examine, following [1], how to perform a Quantum Computation on a one-dimensional chain of qubits
with two switchable fields. We discuss this as it is the minimal case, so more complex systems should always be able
to use this system, generally with refinements available which can significantly decrease the overheads involved in
this simple scenario [8]. The concept was originally introduced by Lloyd [9] and, although his constructions were not
minimal, they have formed the basis of all subsequent schemes.

Let us take a simple chain of qubits and allow pair-wise interactions between them. These should be grouped like 
the Hamiltonians $H_1$ and $H_2$, but allow general two-qubit gates instead of a simple SWAP gate (denoted by $\beta$ and
$\alpha$ respectively). Further, we shall make a distinction between the odd-numbered qubits (denoted A) and the
even-numbered ones (B), as shown in Fig. [1]. The desire for a general 2-qubit interaction may seem like we're demanding a
lot of the system, but there are simple ways to rephrase this requirement. For example, we can consider the ability to 
perform arbitrary single-qubit rotations on all the A qubits or on all the B qubits. Coupling this with a single 2-qubit 
interaction, such as a controlled-phase, is enough to create any arbitrary two-qubit interaction [T(|. 

The first thing to note is that by alternating applications of $H_1 = \beta^{SWAP}$ and $H_2 = \alpha^{SWAP}$, we can move the states
on the A qubits independently of those on the B qubits (this is just state transfer again), assuming a chain of infinite
length. The way that we plan to implement the computation is to place computational qubits only on the As. We
somehow initialise one of the B qubits in the $|1\rangle$ state and take every other qubit to be, initially, in the $|0\rangle$ state. At
this stage it might appear that we are avoiding single-qubit control by creating a single-qubit state at the start of the 
computation. However, this is easier than single-qubit gates in general because there are other properties that we can 
take advantage of, such as edge effects at the end of the chain. How we perform the initialisation will be determined 
by the physical system [Til fl2l . ITU , Il3 | , but for the moment we can assume that we have control over a single qubit 
at the end of the chain. Our state transfer protocol can therefore be used to move this unique state, known as the 
Control Unit (CU), such that it is adjacent to any A qubit that we desire. At that point, we can perform a gate
$\beta^{C-U}$, i.e. a controlled-U operation, which means that the operation U is performed on the qubit to the right of the
CU. Everywhere else, the control qubit is in the $|0\rangle$ state, and so nothing happens. As a result, we have performed a
one-qubit gate on a specific qubit using global pulses. 
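
The targeting idea can be illustrated with a classical toy model in which the operation U is a bit flip. This is a sketch under stated assumptions: `a_row[i]` is taken to sit immediately to the right of `b_row[i]`, and moving the CU is idealised as a shift of the B row. Both helper names are hypothetical.

```python
def move_cu(b_row, steps):
    """Shift the B row right by `steps` sites, padding with 0 (idealised state transfer)."""
    return ([0] * steps + list(b_row))[: len(b_row)]


def global_controlled_flip(a_row, b_row):
    """Apply 'if B[i] is 1, flip the A qubit to its right' at every site at once.

    Classical stand-in for a global controlled-U pulse with U = X.
    """
    return [a ^ b for a, b in zip(a_row, b_row)]


if __name__ == "__main__":
    a_row = [0, 1, 1, 0, 1]          # computational (classical) values on the A qubits
    b_row = [1, 0, 0, 0, 0]          # a single CU at the left end of the B row
    b_row = move_cu(b_row, 3)        # bring the CU next to the fourth A qubit
    print(global_controlled_flip(a_row, b_row))   # only the fourth entry changes
```

The pulse addresses every pair, but all controls except the CU are 0, so only one A qubit is acted upon.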

All we now have to do is show how to perform a two-qubit gate, such as a controlled-NOT. The basic way that
we plan to do this is to show how to entangle the CU and the control qubit. The state transfer protocol can then be
used to move the CU to the target qubit, where it performs a standard one-qubit gate, before moving the CU back to
the control qubit, and reversing the entangling steps to return the CU to its original state (Fig. [2]). This is the most
involved part of a global control scheme, and requires some modification of the structure already outlined. We now
choose to shift the computational qubits such that they are spaced onto every third A qubit, the rest being left in
the $|0\rangle$ state$^{2}$. Having moved the CU to the right of the control qubit, the entangling sequence proceeds as follows
(reading from left to right):

$$\alpha^{Y-C}\,\beta^{C-X}\,\alpha^{C-H}\,\beta^{C-Z}\,\alpha^{C-H}$$

The second step of this process copies the state of the CU onto the (empty) A qubit to its right. This is allowed
because it's in a classical state. If the CU is absent, the $\beta^{C-Z}$ step does nothing, and the two $\alpha^{C-H}$ terms cancel
each other. However, if the CU is present, a Z is introduced. Since HZH = X, it is possible to see that the CU gets
deactivated conditionally on the state of the qubit to its left (the control qubit), thus giving the result we require.

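$$(\alpha|0_A\rangle + \beta|1_A\rangle)\,|0_B\rangle|0_A\rangle \rightarrow (\alpha|0_A\rangle + \beta|1_A\rangle)\,|0_B\rangle|0_A\rangle$$
$$(\alpha|0_A\rangle + \beta|1_A\rangle)\,|1_B\rangle|0_A\rangle \rightarrow (\alpha|0_A\rangle|0_B\rangle + \beta|1_A\rangle|1_B\rangle)\,|1_A\rangle$$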
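
The role of the identity HZH = X in the last three steps of the sequence can be checked with a small state-vector sketch. It models only those three steps, on three qubits ordered (left A, B, right A). The wiring is an assumption of this illustration: the left A qubit controls a Hadamard on B, and the controlled-Z acts between B and the right A qubit.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]


def controlled_h(state, control, target):
    """Apply H to `target` on every basis state whose `control` bit is 1."""
    out = {}
    for bits, amp in state.items():
        if bits[control] == 0:
            out[bits] = out.get(bits, 0) + amp
            continue
        for new_bit in (0, 1):
            new = list(bits)
            new[target] = new_bit
            key = tuple(new)
            out[key] = out.get(key, 0) + H[new_bit][bits[target]] * amp
    return out


def controlled_z(state, q1, q2):
    """Flip the sign of every basis state in which both qubits are 1."""
    return {bits: (-amp if bits[q1] and bits[q2] else amp)
            for bits, amp in state.items()}


def deactivation_steps(state):
    """C-H, C-Z, C-H on qubits ordered (A_left, B, A_right)."""
    state = controlled_h(state, control=0, target=1)
    state = controlled_z(state, 1, 2)
    state = controlled_h(state, control=0, target=1)
    return {bits: amp for bits, amp in state.items() if abs(amp) > 1e-12}


if __name__ == "__main__":
    print(deactivation_steps({(1, 1, 1): 1.0}))  # B is flipped to 0
    print(deactivation_steps({(1, 0, 0): 1.0}))  # no CU copy: nothing happens
```

Under these assumptions the B qubit is flipped only when both of its A neighbours are 1. When the right-hand A qubit is 0, the two controlled-Hadamards cancel.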

There are a number of useful points to notice about this protocol. Firstly, if the qubit that we use as the control is
a classical bit (i.e. a definite $|0\rangle$ or $|1\rangle$), this acts as a flag that indicates whether or not to deactivate the CU. Care
does need to be taken, however. For example, if we continued with this CU and tried to perform a controlled gate
with it, then, during the entangling step, the location where the original entangling steps took place creates a new
CU as well. Such additional interactions are easily compensated for, provided we remember that they happen. We'll
come back to this process of enabling/disabling CUs, as it is very useful in error correction procedures. Secondly, if we
were to measure, simultaneously, all of the B qubits immediately after the entangling operation, given that we know
that every B qubit other than the CU is in the $|0\rangle$ state, this acts as a measurement on the CU and, therefore, acts
as a measurement on the single qubit that was acting as the control qubit. The CU can still be recovered afterwards
so that we can continue with the computation. While not described in the original proposal, [H, this measurement 
method is supported by the architecture presented. 

This means that we now have a universal set of operations in this globally controlled structure, only requiring 
initialisation of the CU. 



C. Minimality of Encoding 

The scheme that we have presented here encodes a single computational qubit in every six physical spins. Slight 
modification of this idea allows the reduction to ten spins for two computational qubits. It is, naturally, an interesting 
question as to whether this is the minimal encoding. We can give simple arguments that indicate that this is indeed 
the case. Let us assume that we wish to keep the idea of a control unit. It is clear that this will have to be able to 
move relative to the computational qubits. As a result, we must already introduce a doubling of the spins (to divide 
them into As and Bs). 

In addition, we must consider the two-qubit gate. There are two possible concepts as to how this could be implemented.
Firstly, we could consider the already outlined mechanism of coherently disabling the CU. Since this must
be a reversible process, it must leave the information about the original CU somewhere so that it can be recovered.
Given that we are using qubits$^{3}$, this information must be stored on an additional system, and hence could either be
placed on another A qubit, requiring a trebling of the device size (an extra A to store the state of the CU and an extra 
A to give a break between the stored CU and the next computational qubit), or by placing it on another B, without 
increasing the device size. We cannot achieve this for a similar reason to that which is outlined below for the futility 
of the second mechanism for a two-qubit gate. 

The second possibility for implementing a two-qubit gate is to use the CU to 'pull' a computational qubit along with 
it, as it moves through the device towards the target qubit. To achieve this, the CU must be capable of implementing a 
set of operations that swaps its nearest-neighbours. However, in order to achieve this (without performing the SWAP 



2 There is some small reduction in terms of the cost of qubits that can be made here, but it has been neglected for the sake of clarity. 
This reduction is present in the device structure of Fig. [3] 

3 I n |l2t HH we reduced the overhead by using higher dimensional systems 
FIG. 3: The apparently minimal device size for three qubits. However, if we want to perform a one-qubit gate on the third 
qubit, we get a two-qubit gate between the other two qubits. 

anywhere else), the two qubits to be SWAPped must interact. This interaction can only occur through the CU because 
we only have two-body interactions, and hence cannot be controlled by the CU. Therefore, this is impossible. 

We are therefore left with the scheme outlined so far. A slight improvement can be made by realising that we only 
have to perform a single two-qubit gate at any time. As a result, we only need to store an inactive CU in one place. 
Therefore, pairs of qubits can share the region of spins where the CU can be stored. This reduces the requirements 
from 6 physical spins per qubit to 10 spins for every 2 qubits. 

D. Device Size 

In order to move to a global control scenario, we have had to switch from a situation where every single qubit would 
have been a computational qubit, to one where every 6th qubit is computational, with an additional knock-on cost
to the number of steps required in the protocol. When we want to consider a physical device, enumeration of these 
costs may be important. In particular, if we know how many qubits we can reasonably build in a system, how many 
computational qubits can we get out? One hidden cost of the global control scheme is that of edge effects. The state 
transfer protocol, which smoothly moves the row of A qubits through the row of B qubits assumes an infinitely long 
chain. If we have a finite size of chain, what happens at the ends is that some states that were stored on A qubits 
start piling up on qubits labelled by B, and it is possible that pairs of computational qubits would be adjacent to each 
other during gate processes, causing additional, unwanted interactions. To avoid this, it is necessary to ensure that 
there is sufficient 'padding' (i.e. a large number of qubits in the $|0\rangle$ state at either end of the device) for the A states
to move into. This means that to implement N computational qubits, the device needs to contain 12N physical spins.
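
As a quick aid for the device-size question posed above, the counting can be written out directly. This is only a sketch of the arithmetic, using the figure of 12 physical spins per computational qubit once padding is included. The function names are illustrative.

```python
SPINS_PER_QUBIT_WITH_PADDING = 12   # 12N spins for N computational qubits


def spins_required(n_qubits):
    """Physical spins needed for n_qubits computational qubits, padding included."""
    return SPINS_PER_QUBIT_WITH_PADDING * n_qubits


def qubits_available(n_spins):
    """Computational qubits obtainable from a chain of n_spins physical spins."""
    return n_spins // SPINS_PER_QUBIT_WITH_PADDING


if __name__ == "__main__":
    print(spins_required(5))
    print(qubits_available(100))
```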

II. ERROR CORRECTION 

If the operations that we perform are perfect, then our scheme is complete - we can implement a universal set of 
gates in an efficient manner. In reality we will not be able to implement these operations perfectly and therefore we 
require error correction [2]. At first glance, this is a huge obstacle, for two main reasons.

Firstly, we have set up our system so that we can implement only one operation at a time. Aharonov and Ben-Or 
[IH have proved that this is insufficient to be able to implement error correction, and that a degree of parallelism of 
at least $O(\log(N))$ is required for computation on N qubits. We can consider introducing multiple CUs into the device
to satisfy this condition, as depicted in Fig. [5]. In that case, however, we have lost the ability to implement individual
gates within the device, such as during those periods between phases of error correction. To address this problem we 
will require a method for switching between the two phases where we have different arrangements of CUs. 

The second problem with error correcting a global control scheme is that traditional descriptions of error correction 
involve making measurements to determine the locations of errors. These measurement results will be different for 
each encoded qubit, and hence the required correction will be different as well. As a result, even though we need to 
run multiple CUs in parallel, it appears that they have to do different things! This problem can be circumvented by 
making the correction procedure coherent. 

We need to ensure that we error correct all the qubits in our system. This means not only the computational 
qubits, but also the CUs and the 'buffer' qubits - all those in a classical state that do not play an active role in the 
computation. To start with, we shall consider that all the classical states are stable, and just describe error correction 
for the computational qubits. 
FIG. 4: When in error-correcting mode, a globally controlled architecture must apply the same operations on all blocks of
qubits simultaneously. Taken from [2].




FIG. 5: Error correction of the Steane [[7,1,3]] code using a coherent feedback process instead of measurement and correction. 
This circuit must be repeated twice to correct for both X and Z errors. 



A. Coherent Correction Procedures 



When error correction is performed, the scheme typically follows the process of performing syndrome extraction so 
that some ancilla qubits contain information on the errors. After that, the ancillas are measured and we act depending 
on the results in some fixed, logical way. Hence, we might as well include this logic in a quantum circuit that feeds 
the error information directly back to the encoded qubit. This way, the correction procedure is the same for every 
encoded qubit, independent of what errors have occurred. The information on what errors occurred is left in the 
ancilla qubits, which then need to be reset. This idea is illustrated in Fig. [5].
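
The coherent-feedback idea can be illustrated on a much smaller code than the Steane code used in the paper. The sketch below swaps in the 3-bit repetition code, handles bit flips only, and works on classical basis states. The same reversible gates act linearly on superpositions. All names are assumptions of this illustration.

```python
def coherent_correct(data, ancilla=(0, 0)):
    """Reversible syndrome extraction and feedback for the 3-bit repetition code.

    No measurement is made: the syndrome stays on the ancillas, which would
    need to be reset before the next round.
    """
    d = list(data)
    a0, a1 = ancilla
    # Syndrome extraction: parities of neighbouring data bits (CNOTs onto ancillas).
    a0 ^= d[0] ^ d[1]
    a1 ^= d[1] ^ d[2]
    # Coherent feedback: flips controlled by the ancilla pattern (Toffoli-type gates).
    d[0] ^= a0 & (1 - a1)
    d[1] ^= a0 & a1
    d[2] ^= (1 - a0) & a1
    return tuple(d), (a0, a1)


if __name__ == "__main__":
    for corrupted in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 1)]:
        print(corrupted, "->", coherent_correct(corrupted))
```

The same fixed sequence of gates is applied whatever error occurred, which is what allows every block to be corrected in parallel by identical global pulses.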



B. Ancilla Reset 



In order to avoid needing to measure and correct, we need to be able to reset qubits that are in an unknown state
to the $|0\rangle$ state. To achieve this, we assume that our qubits are not just qubits but have a third level, which we
can populate from either of the states $|0\rangle$ and $|1\rangle$. However, we will also assume that this third level, which is at
some higher energy than the computational states, has a dissipative decay to one particular state, say $|0\rangle$. This is
essentially the same idea as algorithmic cooling [16]. The reset procedure is denoted in our circuits by £.



C. Switchable Parallelism 



In order to perform computation on our device, we require the ability to switch between two different scenarios. 
Firstly, we need a CU for every encoded qubit, so that we can perform error correction on it. Secondly, we need a 
single CU in the computer so that we can perform the algorithmic part of our computation. Thankfully, we have 
already seen how we can switch a CU off - we just use the first steps of a two-qubit gate, where the control qubit is 
a classical state. This classical state indicates whether the CU should be left switched on or not. We can therefore 
consider a slight modification of our device structure. We have many repeating blocks of L qubits, each encoding a
single logical qubit. Adjacent to each block is another qubit, which is initialised in a classical state. All of these are
set to $|1\rangle$, except for one, which is set to $|0\rangle$. This additional qubit in the block of L is referred to as the Switching
Station (SS).
FIG. 6: In order to switch between our algorithm on a quantum computer and error correcting mode, we must switch on and 
off a regular array of CUs, except for one which remains constant. This is achieved by performing the start of a two-qubit gate 
on a classical state stored at the edge of each encoded qubit. (1) The system starts in a state with one CU per switching station, 
performing operations in parallel on each section of the computer. (2) To switch to an algorithmic step, the CUs are moved to 
the switching stations, and all but one are deactivated. (3) The single CU performs an algorithmic step on a set of qubits. (4) 
The single CU returns to its original switching station, and all CUs are reactivated for another cycle of error correction. Taken 
from @]. 



Our computer starts off in error-correcting mode, i.e. with one CU for every logical qubit. When we want to perform 
a computational step, we move the CUs to the SSs (given the regular spacing of all the CUs and SSs, this happens 
simultaneously for all of them) and perform the first step of a two-qubit gate. This deactivates all the CUs in a 
reversible manner in every SS, except for the one that is set to $|0\rangle$, leaving a single CU for performing the algorithmic
steps that we require. 

Ideally, between two phases of error correction we would be able to perform an arbitrary computational step.
However, given the one-dimensional organisation of our computer, we have to accept that for increasing device size,
this is impossible. Instead, when we have to perform gates between distant qubits, we substitute this for a series of
SWAP gates to gradually move the qubits closer together, so that they can eventually interact. One subtlety,
however, is that our single CU appears in a single position, and so we need an increasing number of steps just to move
it into the correct position. In the same way that we can move a computational qubit along the array, we can move
the CU's starting position along the array. We achieve this by understanding how the deactivated CUs are stored in
a pattern of $|101\rangle$ on physical qubits. Our single-qubit gate protocol allows us to create this pattern in the region
where our CU remains switched on, and remove it from the next SS along.



D. Parallelism for Fault Tolerance 

Benjamin originally introduced the idea of SSs in [13], where he referred to them as sub-computers. Each sub- 
computer had a unique label, and the CUs would perform a computation on each label to decide whether they should 
be deactivated, allowing arbitrary patterns of CUs to be created. The concept of SSs is much more limited - we only 
need two different patterns of CUs. As a result, these SSs only require a fixed proportion of the device size, whereas 
the labels for the sub-computers grew with $\log(N)$. While this is still a relatively modest cost, the computation time
to create arbitrary patterns of CUs must require O(N) steps. 

The next logical question is whether there are fixed patterns of CUs that we can usefully create, while still requiring 
only a fixed proportion of the device size, and only requiring a fixed amount of the computation time. In particular, 
we would like to consider the configurations for fault-tolerance (FT). In this scenario, we desire the ability to switch
from a single CU to sets of CUs that are active every $L^n$ encoded qubits, where n is an integer from 0 (the original
EC pattern) up to some maximum level, $p - 1$. This is because an encoded qubit consists of L computational qubits.
Hence, when we concatenate a single level of code, the encoded qubits that we now produce consist of L of the
originally encoded qubits. This continues up the levels of concatenation, yielding the indicated power law. The
number of levels of concatenation, p, is independent of the number of qubits that we want to perform computation
on; it only depends on the accuracy that we desire for our final computation. Hence, within our SS we could envisage
p qubits, each indicating whether a particular CU should be turned on at each level of concatenation. This would
require a minimum of computation at each step (one gate), and still only requires a fixed proportion of the device
FIG. 7: To generate more complex sets of CUs, we perform computation on a switching station (2), which has been classically 
preprogrammed with sufficient information. The output is placed on a result qubit, which is used to conditionally deactivate 
sets of CUs. At the end of the process, the same result bit is used to reactivate the CUs, and the initial computation is inverted 
(4), or the result bit is reset. 



size. 

This expanded SS is actually more powerful than we require, and could be useful for some more advanced ideas, such
as super-CUs. The rationale behind a super-CU is that when we operate on concatenated codes, most of the operations
that we perform can be done bitwise. Instead of using a single CU and repeating the same operation many times, it
would be more sensible to have a super-CU, which consists of a dense block of CUs, allowing us to perform these bitwise
steps with a single command. However, for the simplest form of fault-tolerance, this expanded SS is more complex
than we require. Instead of p qubits, we can reduce it to $\lceil \log_2(p + 1) \rceil$ qubits, and a small additional computation.
The point here is that we can organise it so that when we require a CU every $L^n$ qubits instead of every $L^{n-1}$ qubits,
we only have to switch off CUs (there is no need to reactivate any). As a result, we just need the label of the SS to say
at which level the corresponding CU gets switched off. Hence, we only have to encode the numbers 0 to $p - 1$, which
can be achieved in $\lceil \log_2(p) \rceil$ qubits. Of course, we also have to store the single extra number $(p + 1)$ that marks the
location of the single CU which we retain for performing the algorithmic steps of the computation.
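
The nesting of the CU patterns and the size of the labels can be made concrete with a short sketch. It assumes, for illustration only, that encoded qubits are indexed from 0, that the CUs active at level n sit on indices that are multiples of L**n, and that each label records the highest level at which its CU is still active. The function names are hypothetical.

```python
def label_qubits(p):
    """Qubits needed to store p + 1 distinct labels, i.e. ceil(log2(p + 1)) for p >= 1."""
    return p.bit_length()


def switching_label(j, L, p):
    """Highest level n < p at which the CU of encoded qubit j is still active."""
    n = 0
    while n + 1 < p and j % (L ** (n + 1)) == 0:
        n += 1
    return n


def active_cus(num_blocks, L, p, level):
    """Indices of encoded qubits whose CU is active at the given concatenation level."""
    return [j for j in range(num_blocks) if switching_label(j, L, p) >= level]


if __name__ == "__main__":
    for level in range(3):
        print(level, active_cus(27, 3, 3, level))
    print(label_qubits(3))
```

Because each pattern is a subset of the one below it, moving up a level only ever requires switching CUs off, which is why a single stored number per switching station suffices.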

We must still show how the actual process of activation/deactivation of CUs occurs. To achieve this, we will start
with a CU active in every SS, and perform a small (reversible) computation on the label of the SS. This computation
will output a single bit onto an ancilla qubit, which can be used for the activation/deactivation procedure as before.
We give the required computation in Tab. I. The concept of the circuit is quite simple to understand. We wish to
determine if $b > a$. Starting with the most significant bit of each, if $b_1 > a_1$, then $b > a$ and there is no need to
continue. Similarly, if $b_1 < a_1$, there is no need to continue. We then move to the next most significant bit. We only
compare bit x if all the more significant bits of a and b are equal. This continues until termination of the sequence,
either because the answer is determined, or because we have run out of bits (in which case we must also know the
answer, but the terminating step is consequently slightly different).
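
The logic of the comparison can be sketched classically. This is not the reversible circuit of the table, only the decision procedure it implements, written for bit lists ordered from most to least significant. The function name is an assumption of this illustration.

```python
def b_greater_than_a(b_bits, a_bits):
    """Decide whether b > a for equal-length bit lists, most significant bit first."""
    for b, a in zip(b_bits, a_bits):
        if b > a:
            return True    # decided at this bit: b > a
        if b < a:
            return False   # decided at this bit: b < a
        # Bits are equal, so move on to the next most significant bit.
    return False           # ran out of bits: the numbers are equal


if __name__ == "__main__":
    print(b_greater_than_a([1, 0, 1], [0, 1, 1]))
    print(b_greater_than_a([0, 1, 1], [1, 0, 0]))
    print(b_greater_than_a([1, 0], [1, 0]))
```

A bit is only examined when all more significant bits agree, which mirrors the conditional structure described in the text.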

We therefore see that it is possible to generate sufficient parallelism for fault-tolerance. In subsequent sections, we 
will ensure that we have taken into account all necessary considerations by deriving a fault-tolerant threshold for our 
scheme. However, before we do this, we must consider all the other qubits in the system, not just the computational 
qubits. 



III. ERROR CORRECTION OF THE CONTROL UNITS 



We have now presented a scheme that appears to be able to perform arbitrarily accurate computation on the 
one-dimensional globally controlled array, even in the presence of (small) errors. However, this scheme has implicitly 
assumed the stability of all the classical states in the system - all the padding $|0\rangle$ states, the labels in the SSs and



4 If we are just switching from one level of concatenation to another, it will generally be more efficient to bypass this step, and just use the
already active CUs.
TABLE I: Quantum algorithm, using ancillas d to determine if $a > b$. $a_x$ is the $x^{th}$ most significant bit of the number a, which
is stored in the SS. b represents the level of concatenation for which we wish to activate the CUs.

the CUs. If any of these become corrupted, the whole computation can become corrupted. As a first step towards 
protecting these, we can simply state that our regular device structure guarantees the locations of these classical 
states in a periodic way. Hence, potentially, we could create global pulses that just address these. For example, the 
B qubits should always be classical states (unless we're in the middle of performing a two-qubit gate). By applying
regular measurement to these classical states, we get a Zeno effect which prohibits transition of a classical state
into its complement. In the long run, this is not sufficient for arbitrarily accurate computation, but may delay the
requirement for more aggressive schemes of error correction. 

Eventually, we require the ability to perform error correction on the CUs. The first step in doing this is to acknowledge
that these are classical states and, hence, we only have to protect them against bit-flips, and not phase-flips. As
such, we can use a classical repetition code to protect the information. We still intend to perform our computation 
with single CUs, but at the error correcting stage we will switch on different sets of triples of CUs. These triples will 
compare themselves to each other and attempt correction. This step is somewhat non-trivial because we must allow 
the potentially faulty CUs to control their own actions, and yet we still need to be sure that the correction will result. 
One way in which this can be achieved was first presented in 0]. 

So, instead of using a single CU for a computation, we shall now use three of them. We do not intend to perform 
any part of the algorithm with all three CUs present, as this will be less efficient than switching off two of them. We 
can initially align these with computational qubits. The patterning needs to be chosen with care in order to minimise 
the complexity of the protocol. We target an operation on a single qubit by applying a controlled-phase gate (CP)
with each of the three CUs, separated by the single qubit rotations $U_1$, $U_2$, $U_3$ and $U_4$ applied to all the A qubits. The
resulting evolution on the targeted qubit is

$$U_1 Z U_2 Z U_3 Z U_4.$$



Any qubits that are far enough away from the CUs will not be affected by the CPs, and hence will be subject to the
evolution $U_1 U_2 U_3 U_4$, which we select to be the identity transformation, i.e. $U_4 = U_3^\dagger U_2^\dagger U_1^\dagger$. We need to create sequences
such that we can apply an arbitrary rotation to the qubit we want, but do nothing to any of the other qubits. This
necessarily includes qubits that get affected by one or two of the CPs. The first step is to select a patterning of the
CUs such that no qubit ever experiences 2 CPs (unless an error has occurred). The simplest example is to align the
CUs with qubits $q_1$, $q_2$ and $q_4$ (note the gap, $q_3$).

In the fault-tolerant scenario that we have been describing, the most efficient way of introducing these three CUs
is by relabelling the switching stations using the following protocol. If the label was non-zero, add 1 to the value
(these correspond to the qubits $q_1$ for each block). Label the zero-valued SSs that are 1 and 3 SSs away from these
relabelled SSs with the number 1 (corresponding to qubits $q_2$ and $q_4$). So, by deactivating all CUs in regions controlled
by SSs with labels 0, we get the regular patterning that is required, and we can still access the parallelism required
for fault-tolerance. In fact, by repeating this procedure at every level of concatenation, this enables patterns of CUs
for fault-tolerance at a cost of a single extra bit in each SS.

The 1-2-4 arrangement of CUs can still cause unavoidable applications of single CP gates,

$$U_1 Z U_1^\dagger$$

$$U_1 U_2 Z U_2^\dagger U_1^\dagger$$

$$U_1 U_2 U_3 Z U_3^\dagger U_2^\dagger U_1^\dagger.$$

Clearly, if we apply the sequence twice, all of these will be removed, while the targeted qubit experiences the evolution

$$U_1 Z U_2 Z U_3 Z U_3^\dagger U_2^\dagger Z U_2 Z U_3 Z U_3^\dagger U_2^\dagger U_1^\dagger,$$

which, given that we have a free choice of $U_1$, $U_2$ and $U_3$, must contain sufficient freedom to create any single-qubit
rotation that we desire (up to a global phase).
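The cancellation argument can be checked numerically. The following is a minimal sketch in plain Python, assuming nothing beyond the sequence described above; the helper names (`mul`, `dag`, `unitary`, `one_pass`) and the particular rotation angles are arbitrary choices made for illustration.

```python
import cmath
import math

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]


def mul(*mats):
    """Product of 2x2 matrices, in the order written."""
    out = I2
    for m in mats:
        out = [[sum(out[i][k] * m[k][j] for k in range(2)) for j in range(2)]
               for i in range(2)]
    return out


def dag(m):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]


def unitary(theta, phi, lam):
    """Generic single-qubit unitary (parametrisation chosen for illustration)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -cmath.exp(1j * lam) * s],
            [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c]]


def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def one_pass(u1, u2, u3, cps):
    """U1 [Z] U2 [Z] U3 [Z] U4 with U4 = U3^dag U2^dag U1^dag.

    cps[k] is True if the k-th CP acts on the qubit being considered.
    """
    u4 = mul(dag(u3), dag(u2), dag(u1))
    ops = [u1]
    for hit, u in zip(cps, (u2, u3, u4)):
        if hit:
            ops.append(Z)
        ops.append(u)
    return mul(*ops)


u1, u2, u3 = unitary(0.3, 1.1, -0.7), unitary(1.9, 0.2, 0.5), unitary(2.4, -1.3, 0.9)

# Qubits far from every CU see the identity after a single pass.
assert close(one_pass(u1, u2, u3, (False, False, False)), I2)

# Qubits hit by exactly one CP see the identity after two passes.
for cps in [(True, False, False), (False, True, False), (False, False, True)]:
    single = one_pass(u1, u2, u3, cps)
    assert close(mul(single, single), I2)

# The targeted qubit (all three CPs) sees the doubled sequence given in the text.
target = one_pass(u1, u2, u3, (True, True, True))
explicit = mul(u1, Z, u2, Z, u3, Z, dag(u3), dag(u2), Z,
               u2, Z, u3, Z, dag(u3), dag(u2), dag(u1))
assert close(mul(target, target), explicit)
```

The single-CP cases cancel because each has the form $W Z W^\dagger$, which squares to the identity for any unitary $W$.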

Now that we are using a CU that has some redundancy in it, there is enough information to be able to correct 
for errors. There are two stages involved in making use of this information. Firstly, we must perform a syndrome 
extraction, placing information about any errors that have occurred on an ancilla qubit (which would otherwise have 
been a computational qubit). Secondly, we must feed back from this ancilla to be able to correct the faulty part of 
the CU. 

If one of the CUs suffers a bit flip, then only the targeted qubit will be affected. We can therefore neglect all other
qubits, and just concentrate on the ancilla that we will be targeting, and which will initially be in the state $|0\rangle$. For
the three bits that can get flipped, the resulting evolution will be one of

$$U_1 U_2 Z U_3 Z U_3^\dagger Z U_3 Z U_3^\dagger U_2^\dagger U_1^\dagger,$$

$$U_1 Z U_2 U_3 Z U_3^\dagger U_2^\dagger Z U_2 U_3 Z U_3^\dagger U_2^\dagger U_1^\dagger,$$

or

$$U_1 Z U_2 Z U_2^\dagger Z U_2 Z U_2^\dagger U_1^\dagger.$$

Using either $U_2 = 1$ or $U_3 = 1$, the evolution when the CU has no error is $1$. The results if there have been errors are
shown in Table II, where

$$V_n = U_1 Z U_n Z U_n^\dagger Z U_n Z U_n^\dagger U_1^\dagger.$$

Here $n$ is either 2 or 3, where $U_n \neq 1$. We thus have free choice of $U_1$ and $U_n$ to make this remaining evolution $X$ (for
example, $U_1 = 1$ and $U_n = e^{-iX\pi/4}$), which enables the error syndrome to be placed on the ancilla. Similarly, we can
create the evolution $Z$ just by changing $U_1$ to $H$. If we wanted to create the Hadamard gate, we would set

1 ( V V2 + 1 - VV2-l \ 

How can this information be used to correct the right error? If we apply $V_3 = X$, i.e. we have set $U_2 = 1$, then the
target (ancilla) qubit is flipped if there is an error on either of the first two CUs. We then use the circuit in Fig. 8 to
feed back the error syndrome from the ancilla to the first CU, where part (a) is controlled by both CUs 2 and 3. If the
error occurred on the second CU, then the feedback process won't occur. Therefore, we can correct for a single error
on any of the CUs by repeating the process.

Alternatively, we could target the same ancilla with pulse sequences such as $(V_2 = H).(V_3 = Z).(V_2 = H)$, which
would flip the ancilla only if an error has occurred on the second CU. In the previous method, we created the local
pattern of two $|1\rangle$s on neighbouring A qubits which is required to feed information back to the CUs by using the CUs.
This is susceptible to error if a nearby B qubit that should be a $|0\rangle$ gets flipped to a $|1\rangle$ (the local pattern can be
created in several places). However, this alternative technique allows us to make one of those $|1\rangle$s a fixed classical
state (within the SS, say), and then we need concentrate on only flipping the one ancilla next to it. In this case, if
only a single error occurs somewhere, correction is localised in the right area of the device, and is stable against other
B qubits being flipped.

| Error occurs on CU | $U_2 = 1$ | $U_3 = 1$ |
|--------------------|-----------|-----------|
| none               | $1$       | $1$       |
| 1                  | $V_3$     | $1$       |
| 2                  | $V_3$     | $V_2$     |
| 3                  | $1$       | $V_2$     |

TABLE II: The evolution that can occur if an error affects a single CU out of the three. By selecting the single-qubit rotations
suitably, syndrome extraction can be performed.
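The entries of Table II, and the claim that the sequence $(V_2 = H).(V_3 = Z).(V_2 = H)$ singles out the second CU, can both be checked with a short plain-Python sketch. It models a faulty CU simply as a missing CP on the target, as described in the text; the helper names and the random-looking rotation angles are illustrative assumptions.

```python
import cmath
import math

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)], [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def mul(*mats):
    """Product of 2x2 matrices, in the order written."""
    out = I2
    for m in mats:
        out = [[sum(out[i][k] * m[k][j] for k in range(2)) for j in range(2)]
               for i in range(2)]
    return out


def dag(m):
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]


def unitary(theta, phi, lam):
    """Generic single-qubit unitary (parametrisation chosen for illustration)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -cmath.exp(1j * lam) * s],
            [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c]]


def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def doubled(u1, u2, u3, faulty=None):
    """Two passes of U1 Z U2 Z U3 Z U4 on the target, with the CP from CU
    number `faulty` (1, 2 or 3) missing because that CU has been flipped."""
    u4 = mul(dag(u3), dag(u2), dag(u1))
    ops = [u1]
    for cu, u in zip((1, 2, 3), (u2, u3, u4)):
        if cu != faulty:
            ops.append(Z)
        ops.append(u)
    single = mul(*ops)
    return mul(single, single)


def v(u1, un):
    """V_n = U1 Z Un Z Un^dag Z Un Z Un^dag U1^dag."""
    return mul(u1, Z, un, Z, dag(un), Z, un, Z, dag(un), dag(u1))


def table(u1, un):
    """Both columns of Table II as {setting: {faulty CU: matrix}}."""
    faults = (None, 1, 2, 3)
    return {"U2=1": {f: doubled(u1, I2, un, f) for f in faults},
            "U3=1": {f: doubled(u1, un, I2, f) for f in faults}}


def ancilla_flipped(faulty):
    """Apply (V2 = H).(V3 = Z).(V2 = H) to |0>. From Table II, V2 acts only
    for a fault on CU 2 or 3, and V3 only for a fault on CU 1 or 2."""
    v2 = H if faulty in (2, 3) else I2
    v3 = Z if faulty in (1, 2) else I2
    op = mul(v2, v3, v2)
    return abs(op[1][0]) > 0.5          # amplitude for |0> -> |1>


u1, un = unitary(0.7, -0.2, 1.4), unitary(1.3, 0.9, -0.5)
vn = v(u1, un)
expected = {"U2=1": {None: I2, 1: vn, 2: vn, 3: I2},
            "U3=1": {None: I2, 1: I2, 2: vn, 3: vn}}
result = table(u1, un)
assert all(close(result[s][f], expected[s][f]) for s in result for f in result[s])
print([f for f in (None, 1, 2, 3) if ancilla_flipped(f)])
```

The table check holds for any choice of `u1` and `un`, so it does not depend on the particular rotations used to turn $V_n$ into a bit flip.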






FIG. 8: Method for feeding back error syndrome to CUs. The specification differs from the first half of Fig. 2 because the steps
in part (a) are now controlled by the 2 CUs that are not being corrected. The first step has also been removed. Part (b) is 
applied to the CU that is being corrected only. 



A. Stability of the Classical States 



Now that we know how to keep the CUs stable, it is relatively easy to see how to keep the other classical states 
stable. We don't even need error correction, we just need to reset the qubits conditional on the presence of the now 
perfect CU. If the reset procedure is imperfect, then it just gets corrected in the next round of resets. This involves 
two steps, one which corrects those on A qubits and another that corrects those on B qubits. Both of these will occur 
in the regime where there is one CU activated for every SS. Resetting the A buffers is simple - we just move the CU
next to each of them and send out the reset command, much as we would for a single-qubit gate. This requires a 
fixed number of steps, independent of device size. 

To correct the B buffers, we move the CU adjacent to the result qubit in the SS (the one which is used as the control
bit), and switch it to the $|1\rangle$ state. We also do that to an adjacent A qubit (i.e. the nearest physical A qubit, not the
next computational qubit). This creates the unique local patterning which is used to feed operations back onto the
B qubits (see Fig. 2). Hence we move all the buffer qubits between these two $|1\rangle$s and reset them. One potential risk
is an interaction with the CU when it is adjacent to one of the $|1\rangle$s during the feedback process. Since the mechanism
for the reset procedure will depend on the physical implementation, this question is difficult to answer, but we can at 
least say that if we were applying a unitary operation, there is no additional interaction. 

One might think that the unique patterning, the same as found in the two-qubit gate, could be used to deactivate 
the CU, meaning that all the B qubits should be in the $|0\rangle$ state, so we could globally apply the reset procedure to
all of them. Unfortunately, this process would also move the imperfections of the B qubits onto the A qubits, and, 
in turn, move some of the computational qubits onto B qubits (if there are errors on some of the A buffers). This 
reminds us that if there are imperfections in the B qubits, these will be mapped onto the A qubits, including the 
computational qubits. However, error correction will correct for these provided they occur sufficiently infrequently. 

Having reset the buffer qubits, we need to reset, or otherwise stabilise, the qubits in the SSs. We can use the different 
levels of concatenation in the fault-tolerant scenario to our advantage for performing resets of different organisations 
of qubits. Consider the scenario where we have a CU enabled for every $L$ SSs. Between their own SSs, there are
$L - 1$ SSs which all store the value 0 and have deactivated CUs. Thus, we can reset all of these. Note, however, that
they can't reset their own states. At higher levels of concatenation, there is a CU for every $L^i$ SSs. Regularly spaced
between each of these are $L - 1$ SSs which contain the number $i - 1$ and deactivated CUs that have not been corrected
yet. So, we can move the active CUs along and reset all of these. Eventually, at the top level of concatenation, there
are some switching stations that have never been reset. We cannot correct them with the single CU (errors would
build up too quickly as the device size scales), so this would appear to be a problem. However, it is not. We have
gone to a level of concatenation that is as good as we need, i.e. it is sufficiently good that it is safe to assume that
the top level of concatenation is stable. Hence, it is acceptable for these CUs to correct their own SSs - the effect
that we are concerned with is precisely the same effect as the termination of the hierarchy of concatenation instead
of continuing it ad infinitum.

IV. ERROR CORRECTION OF QUANTUM CELLULAR AUTOMATA 

As we have presented it here, our physical model corresponds in a natural way to a particular realisation of quantum
computation in optical lattices. In particular, we have assumed we can perform operations on the A qubits, controlled
by the B qubits to their right (for example). Another global control scenario which, again, has a physical counterpart, 
is that our qubits can only determine the total spin of their neighbours, ±1, [U, UMIlSl- Our global pulses then take 
the form of $A_0$, which causes all the A qubits to flip their value if the total spin of their neighbours is 0 (i.e. one
B qubit is spin up, and the other is spin down). This is precisely the model of quantum cellular automata (see 
and references therein). All of the above work on error correction, CUs, SSs etc. can be developed in this scenario.
However, there is one vital difference, in that all schemes that have so far been discovered require an encoding of the 
computational qubits across several physical qubits. This introduces an additional complication, which was carefully 
circumvented by our choice of physical model - the computational basis itself must be stabilised. In particular, the 
logical qubits exist inside a subspace of these physical qubits, and if a faulty pulse sequence causes a departure from
this subspace, an error correcting code can do nothing about this. Some measures can be taken, such as stabilisation 
of the basis via the Zeno effect. This can potentially work to a similar level as error correction, in that we may be 
able to correct for a single error occurring between two Zeno pulses. However, it is still possible that two errors could 
occur, and ruin the computation. Hence, these ideas are not sufficient for a fully fault-tolerant scenario. Even after 
the work of the next section, therefore, the possibility of fault-tolerance in a quantum cellular automaton remains an
interesting open question. It may be solvable using the techniques of [2l|, [23. l23l . [HI , where it is shown that it is 
possible to keep encoded qubits within a particular subspace, and especially [25] where this is done under a form of
global control. We have not, as yet, explored this possibility. 

We can even justify that, using the concept of a CU, we must use an encoded basis for the CA model. This argument 
is simply a symmetry argument. Let us assume that we can encode the qubits and the CU in single spins. Since all 
the A qubits will behave the same, and the CU must do something different to the computational qubits, we can 
envisage placing the computational qubits on As, and the CU on a B. We then need to move the CU relative to the 
computational qubits. 

ABABABABAB 
0000010000 

If we were to send a pulse $B^U_x$, then clearly we cannot do anything on just the CU. If we send a pulse $A^U_x$, then we
can perform operations on the neighbours of the CU by setting x = 0. However, by symmetry, both neighbouring A 
qubits perform the same operations i.e. if we can find a sequence of pulses to propagate the CU to the left, it also 
moves to the right. We can either choose to use this symmetry in our system, which requires us to double its size, 
so the CU is encoded on 2 spins, or we can break the symmetry with respect to A and B by choosing an encoding 
of the computational basis over an even number of spins. Raussendorf and others have shown another way to use
the symmetry of the system without requiring a CU in the system (effectively by using edge effects to replace the CU)

Mm- 
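The symmetry argument can be illustrated with a purely classical toy model of the chain shown above. This is a sketch under stated assumptions: the function `pulse`, the use of bits rather than spins, and the decision to leave the two end sites untouched are all choices made here for illustration. In the bit picture, a neighbour pair containing exactly one 1 corresponds to a total neighbour spin of 0.

```python
def pulse(bits, sublattice, ones):
    """Flip every interior site of one sublattice whose two neighbours
    hold exactly `ones` 1s. Sites alternate A, B, A, B, ... from index 0.
    End sites (with a single neighbour) are left alone in this toy model.
    """
    start = 0 if sublattice == "A" else 1
    new = list(bits)
    for i in range(start, len(bits), 2):
        if 0 < i < len(bits) - 1 and bits[i - 1] + bits[i + 1] == ones:
            new[i] ^= 1
    return new


# ABABABABAB
# 0000010000   (CU on the B site at index 5)
chain = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

after = pulse(chain, "A", ones=1)
print(after)

# Whatever happens to the left neighbour of the CU also happens to the right.
assert all(after[5 - k] == after[5 + k] for k in range(1, 5))
```

Since the update rule cannot distinguish left from right, any pulse sequence that spreads the CU pattern in one direction spreads it in the other as well, which is the obstacle described in the text.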

V. DERIVING FAULT-TOLERANCE AND CONSTRUCTING THE CIRCUITS 

Having presented a globally controlled architecture that supports fault-tolerance, we would now like to calculate the 
error rate below which the error improves with each round of concatenation, and hence below which we can make our 
computation arbitrarily accurate. The approach that we take is to follow the proofs of [28], which provide a rigorous
derivation of a threshold. The authors then proceed to evaluate this threshold under the assumptions of arbitrary 
parallelism, non-nearest neighbour interactions and the ability to perform measurements. Under the assumption of 
global control, we have to remove all these simplifications, so we expect the threshold to be significantly worse. As 
with initial threshold estimates, however, the important point is not how large or small such an error rate is, but 
that a critical error rate does exist. The ability to derive a threshold in this model is also very useful because it 
is closely related to quantum cellular automata [20], and whether fault-tolerant computation can be achieved with
them. However, as previously described, there are still outstanding issues related to the fault-tolerance of a quantum 
cellular automaton. 

The calculation of a fault-tolerant threshold proceeds approximately as follows. First of all, we select a universal
(generally over-complete) set of gates that we want to work with. Then we construct our error correcting circuit out
of these gates, ensuring that (in this case) a single error occurring within the circuit propagates to no more than
one output qubit. We then take each gate in our universal set (constructed to act on an encoded qubit), and apply
the error correction circuit to all the logical input and output qubits. This idea is shown schematically in Fig. 9 for
the controlled-NOT gate acting on the Steane [[7,1,3]] code, where the controlled-NOT is implemented by applying it
bitwise on the physical qubits.

FIG. 9: Schematic construction for controlled-NOT gate on Steane [[7,1,3]] code, with inputs and outputs surrounded by error
correcting circuits.

The construction of our circuits has to take into account all the mechanisms that we require, such as the CUs, 
the buffer qubits, nearest-neighbour interactions etc. One of the primary simplifying assumptions that we will make 
is that the CUs and buffer qubits are stable. We can do this because we will be able to construct a fault-tolerant 
scheme for these, with their own threshold $\epsilon_c$. We expect this threshold will be much larger than the threshold for
the computation that we wish to implement (primarily because all the states are classical, so we just have to use a 
repetition code, which is vastly simpler). Since this threshold is much larger, we can assume that the time required 
to implement it is a negligible fraction of the computation time, and hence we can just neglect it. 

In order to take into account the physical structure of our device, we shall formulate all the gates in terms of 
nearest-neighbour interactions, with the additional caveat that gate n + 1 must be applied to a qubit that either is, 
or is adjacent to, the qubit that gate n was applied to. In this way, we don't have to worry about the CU running 
up and down the whole time, it just moves to its neighbour, costing a maximum of 3 SWAP operations. As such, the 
SWAP gates that we will add to the circuits are only present to correctly evaluate the number of 'wait' operations - we 
do not actually implement the swap operations. As such, the propagation of errors through these gates is irrelevant. 
In fact, this is very important because otherwise these SWAP gates could cause the errors to propagate to every qubit 
involved, which is certainly undesirable. 

We also have to remember that we can only apply one gate at a time in each error correcting block, and that we 
have to apply the 'do nothing' operation on the rest of the qubits. This adds significantly to the number of steps in 
the scheme. 

Now that we have written out the circuits under these constraints, we count up the number of locations in which
an error can occur in each of the circuits. We then take just the gate (plus the error corrections) that has the largest
location count ($= A$), as this will be the one that determines the threshold. Typically, this gate will be the one with
the most inputs and outputs. If errors occur with probability $\epsilon$ independently at each location, then the probability
of $n$ errors occurring within the circuit is

$$\binom{A}{n} \epsilon^n (1 - \epsilon)^{A - n}.$$

The motivation for the construction demonstrated in Fig. 9 now becomes clear - it was proven in [28] that all single
errors get corrected by these circuits. The probability of an error on the logical qubits is thus the probability that
the error does not get corrected,

$$\epsilon^{(1)} = \sum_{n=2}^{A} \binom{A}{n} \epsilon^n (1 - \epsilon)^{A - n} \approx \binom{A}{2} \epsilon^2.$$

The approximation is valid because we expect (as will be confirmed in Sec. VII) that $\epsilon$ is a small quantity. The
computation can be performed to arbitrary accuracy provided $\epsilon^{(1)} < \epsilon$, so equality gives an error correcting threshold,
$\epsilon_0$. The threshold can then be improved by enumerating the number of benign locations. These are the pairs of
locations at which errors can occur and the logical qubits are still corrected. The first term in the expansion is
therefore reduced from $\epsilon^2 \binom{A}{2}$ to $\epsilon^2 B$.
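As a rough numerical illustration of this argument, the sketch below compares the probability of two or more faults among $A$ independent locations with its leading-order estimate, and solves the equality condition for the threshold. The function names and the value `A = 100` are hypothetical choices made purely for illustration; they are not location counts from the paper, and benign pairs are not accounted for.

```python
from math import comb


def logical_error(eps, locations):
    """Probability of two or more faults among `locations` independent
    sites, each failing with probability `eps`."""
    no_fault = (1 - eps) ** locations
    one_fault = locations * eps * (1 - eps) ** (locations - 1)
    return 1 - no_fault - one_fault


def leading_order(eps, locations):
    """Leading term of the expansion: (A choose 2) * eps**2."""
    return comb(locations, 2) * eps ** 2


def threshold(locations):
    """Solve (A choose 2) * eps**2 = eps for the non-zero root."""
    return 1 / comb(locations, 2)


A = 100                      # hypothetical location count
eps0 = threshold(A)
for eps in (eps0 / 10, eps0, eps0 * 10):
    improved = logical_error(eps, A) < eps
    print(f"eps={eps:.3e}  logical={logical_error(eps, A):.3e}  improves={improved}")
```

Below the threshold the encoded error rate is smaller than the physical one, so a further level of concatenation helps; above it, encoding makes matters worse.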




We intend to calculate the threshold for one particular error correcting code, the Steane 7-qubit code. Primarily, 
this is because the majority of gates can be performed on encoded qubits in a bit-wise manner, making them very 
simple. Secondly, this will give us a useful point of comparison with [28], which also uses this code.

The aim of the remainder of this section is to describe general tactics that allow us to take a specific gate which 
acts on an encoded qubit and show how it can be performed, ensuring that if only a single error occurs in the circuit, 
only a single physical qubit from each encoded qubit is affected on the output (of course, we don't mind what happens 
to ancillas, provided these errors do not propagate to the encoded qubit). 

Naturally, some of our gate constructions are automatically fault-tolerant. In particular, any bitwise gates are 
fault-tolerant because there are no operations that can transfer an error from one physical qubit to another in the 
same encoded qubit. This highlights the reason for choosing the Steane code - a universal set of gates exists where 
only a single gate cannot be applied in a bitwise manner. Note that the gate N, which we introduce in the next
section, allows other gates to be applied in a bitwise manner.



A. Propagation of Errors 



In order to ensure that our gate constructions are fault-tolerant, it is important to understand how errors propagate 
between qubits. The following identities may be useful: 



[Circuit diagrams for Eqs. (2) and (3): identities relating $X$, $Z$ and $H$ gates on either side of a controlled-NOT.]

These essentially state that bit-flip errors propagate from control to target, whereas phase-flip errors propagate in the 
opposite direction. Hence, if we're going to use some ancillas in the gate constructions, we will only have to ensure 
they're correct with respect to one type of error, a more realistic task than protecting them against all errors. 
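The propagation rules can be verified directly with 4x4 matrices. The sketch below checks the standard operator identities for a controlled-NOT with the first qubit as control; these are textbook identities and are written here in operator form, which may differ in presentation from the circuit identities of Eqs. (2) and (3). The helper names are illustrative.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Tensor product of two 2x2 matrices (first factor = first qubit)."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
# Controlled-NOT, control = first qubit, basis order |00>, |01>, |10>, |11>.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

# A bit flip on the control before the gate becomes bit flips on both qubits.
assert matmul(CNOT, kron(X, I)) == matmul(kron(X, X), CNOT)
# A phase flip on the target before the gate becomes phase flips on both.
assert matmul(CNOT, kron(I, Z)) == matmul(kron(Z, Z), CNOT)
# A bit flip on the target, or a phase flip on the control, does not spread.
assert matmul(CNOT, kron(I, X)) == matmul(kron(I, X), CNOT)
assert matmul(CNOT, kron(Z, I)) == matmul(kron(Z, I), CNOT)
```

All entries are integers, so the comparisons are exact rather than approximate.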



B. Cat States and Majority Voting 



The significant problem for fault-tolerant circuits arises when we have to output multiple operations onto a single
ancilla, which is then used to feed back information onto the original qubits. See, for example, Fig. 5. In this example,
syndrome extraction is achieved by performing four controlled-NOT gates, each controlled by the same ancilla (initially
in the $|0\rangle + |1\rangle$ state), targeting the four different qubits from which we are extracting the syndrome. If an error
occurs on this ancilla after the first controlled-NOT, for example, then it can feed through the other gates to affect
the other three qubits.

This problem is circumvented by replacing the single ancilla with four ancillas, prepared in a 'cat' state, $|0000\rangle + |1111\rangle$
(named for Schrödinger's cat). Each of the four controlled-NOTs is then controlled by a different ancilla. From
there, we can take a vote between the four ancillas, onto a fifth, as to what the correct value is, and this can then be
used as the control for the feedback operation. There are two different types of vote that we use, depending on the
situation. We refer to these as weak and strong majority voting.

1. Weak Majority Voting 

In the example discussed so far, we wanted to calculate the single bit which was the syndrome of a set of operators.
We divided this into four separate operators, and the result that we required was the binary sum of these operators,
since we only have to take into account the possibility that there is a single error anywhere. In fact, this is most easily
achieved using a slightly modified version of the cat state,

$$\frac{1}{\sqrt{2}} H^{\otimes 4} \left( |0000\rangle + |1111\rangle \right). \qquad (4)$$

This state is an equal superposition of all 4-bit strings with an even weight (i.e. an even number of 1s). If the four
target qubits are in the correct z-state, then this state is invariant. However, if one of them is faulty, one of the ancilla
qubits is flipped and we have an equal superposition of all 4-bit strings with odd weight. Therefore, this difference is
very easy to detect: each ancilla performs a controlled-NOT onto a fifth ancilla. If a single error occurs at this stage,
then it is only this fifth ancilla which matters.
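The even-weight and odd-weight structure of this state is easy to confirm with a small state-vector sketch. The dictionary representation of the state and the helper names are assumptions made for illustration; the final parity is what a sequence of controlled-NOTs from the four ancillas onto a fifth would record.

```python
from itertools import product
from math import sqrt


def hadamard_all(state, n):
    """Apply H to each of n qubits. `state` maps bit-tuples to amplitudes."""
    out = {}
    for y in product((0, 1), repeat=n):
        amp = sum(a * (-1) ** sum(xi & yi for xi, yi in zip(x, y))
                  for x, a in state.items())
        amp /= sqrt(2) ** n
        if abs(amp) > 1e-12:
            out[y] = amp
    return out


def flip(state, k):
    """Bit flip (X) on qubit k."""
    return {x[:k] + (1 - x[k],) + x[k + 1:]: a for x, a in state.items()}


def parities(state):
    """Set of bit-string parities present in the state."""
    return {sum(x) % 2 for x in state}


cat = {(0, 0, 0, 0): 1 / sqrt(2), (1, 1, 1, 1): 1 / sqrt(2)}
even = hadamard_all(cat, 4)          # the state of Eqn. (4)
odd = flip(even, 2)                  # one ancilla flipped by a faulty target

print(len(even), parities(even))     # number of strings and their parity
print(len(odd), parities(odd))
```

Because every string in the superposition has the same parity, the fifth ancilla ends up in a definite state and no information about the individual ancillas is revealed.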

We now just have to be sure that the state in Eqn. (4) can be created fault-tolerantly from $|0000\rangle$. Note that phase
errors in the final state do not matter to us. We start by creating a cat state by applying $H$ to a single qubit, and
then performing controlled-NOTs. If a bit-flip has occurred on the control qubit, then this propagates to all the target
qubits. However, we will then apply a Hadamard to all the qubits, and hence convert these into phase-flip errors.
These do not propagate to the computational qubits, only affecting the single ancilla bit at the end. If a phase-flip
has occurred on one of the targets before the controlled-NOT, then the ancilla was in the $|0\rangle$ state, and hence phase
errors are irrelevant. Hence, if a single error occurs, it results in bit-flips on no more than one qubit in the state of
Eqn. (4).

This operation is particularly apparent in the circuit for error correction, Fig. 15.

2. Strong Majority Voting 

The other situation that we are interested in, although not explicitly given in Fig. 15, we refer to as strong majority
voting. In this situation, we wish to detect the error on a particular qubit several times. For example, in Fig. 5, a
particular computational qubit is targeted by three controlled-NOTs performing the syndrome extraction. If it were
to suffer an error between these three, the error could propagate to a second qubit due to an incorrect determination
of the syndrome. In this case, we want to provide a constant value for the syndrome extraction steps, stored on an
ancilla. If the ancilla gets flipped at some stage, then the encoded qubit would be falsely corrected, but it would only
introduce a single error, not two.

This proceeds by forming a cat state on several (say 4) ancillas, $|0000\rangle + |1111\rangle$. Controlled-NOT gates are then
applied, each controlled by a different ancilla and targeting the same qubit. These qubits can then be used to vote,
onto a further ancilla, as to whether the qubit had suffered an error. If the state is more than a single flip away from
the cat state, then it was caused by a fault on the computational qubit. An alternative methodology is depicted in
Fig. 10.

VI. FAULT-TOLERANT GATES 

In order to calculate a threshold, we have to specify what set of gates we are going to use. All the circuits which we
construct must be in terms of these primitives. The list specified here is certainly not minimal, but the more gates 
we have, the simpler the circuits that we can construct. 

1. Hadamard Gate, H

2. Bit-Flip, X

3. Phase-Flip, Z, and root, $\sqrt{Z} = S$

4. controlled-NOT

5. SWAP

6. $\pi/8$ gate, T

7. Toffoli gate (controlled-controlled-NOT)

FIG. 10: Possible implementation of the strong majority vote from the initial qubit (q) to the ancilla (a)

With this set of gates, we have to construct circuits on the next level of encoded qubits to implement the entire 
set, and a circuit to implement error correction of the encoded qubits (this circuit is referred to as EC), such that if a 
single error occurs in any of the circuits, this affects no more than one of the qubits on each logical qubit. 

The construction of some of these gates requires some additional sub-circuits including, in particular, 

1. Preparation of cat states on physical qubits, $|00\ldots0\rangle + |11\ldots1\rangle$.

2. Conversion of logical qubit to a classical repetition code, denoted N, e.g.

$$\alpha|0_L\rangle + \beta|1_L\rangle \rightarrow \alpha|00\ldots0\rangle + \beta|11\ldots1\rangle$$

3. Preparation of $|0_L\rangle$

4. State preparation, particularly $|0_L\rangle + e^{i\pi/4}|1_L\rangle$, using an input state $|0_L\rangle$ [29].

5. The error correction circuit, EC. 



A. Bitwise Gates 



The gates H, X, Z, S, controlled-NOT and SWAP are all applied bitwise, and are, therefore, comparatively simple. Their
value of A is almost entirely determined by the product of the number of applications of EC and the number of 
locations in an EC. There are slight differences if we assume we have a super-CU such that all 7 gates can be applied 
simultaneously, or whether we have to apply them one at a time (in which case there are a number of 'wait' operations, 
which contribute to the number of locations) . However these are counted, the values of A will not be comparable to 
some of the other gates. 

FIG. 11: Implementation of the encoded controlled-NOT gate, where gates are required to be between nearest-neighbours.

FIG. 12: Fault-tolerant construction of the $\pi/8$ gate, T.

Let us demonstrate the required counting with an example of an encoded CNOT gate. Consider Fig. 9, ignoring,
for now, the blocks of error correction. No two CNOT gates act on the same qubits, so in principle all seven of
them can be performed simultaneously. Each CNOT gate counts as a single location, so this gate has 7 locations.
Since this gate does not require any measurements, the result is the same when we remove the ability to perform
measurements. If we remove the assumption of parallelism, then only one gate can be applied at a time. Therefore, there
are 13 locations for each of 7 time steps (12 locations where the 'wait' operation happens, and a single location for
the CNOT that we are applying). This gives 91 locations. With a restriction to gates between nearest-neighbours,
we must introduce a series of SWAP gates to interlace the qubits from the two different logical qubits, as depicted
in Fig. 11. The SWAP gates are not correctly ordered for the sake of space. However, it is clear that they can be
implemented such that the CU would only have to jump to its nearest-neighbour to be able to implement the next
gate. This requires a total of 42 SWAP gates, which are implemented one at a time. Hence, the total location count
is $13 \times (42 + 7) = 637$. Finally, we move to the physical model, where we need to count the number of physical
operations required to generate each of the interactions. In particular, to move the CU from one qubit to its neighbour
requires 3 SWAP operations. Hence the total number of time steps increases by a factor of 4. The new location count
is $13 \times (42 + 7) + 14 \times 3 \times (42 + 7) = 2695$. However, this is actually somewhat misleading for a threshold argument
because these additional swap gates only affect the lowest level of concatenation. All higher levels just act on logical 
qubits, which do not have these separations. We can take this into account when calculating the threshold. Let 
be the threshold for the first level of concatenation and higher, while Cq 1 is the threshold at the physical level. We 
can therefore write that 

which should improve our threshold estimate, e^ 1 . 
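The location counts quoted for the encoded controlled-NOT can be reproduced with a few lines of arithmetic. The constant and variable names below are illustrative, and the reading of the term $14 \times 3 \times (42 + 7)$ as "all 14 qubits wait during each of the 3 CU-moving steps that accompany every gate" is an interpretation assumed here.

```python
QUBITS = 14          # two encoded blocks of 7 physical qubits
CNOTS = 7            # bitwise controlled-NOTs
SWAPS = 42           # SWAPs needed to interlace the two blocks
CU_MOVE_STEPS = 3    # SWAP operations to move the CU to a neighbour

# Full parallelism: each controlled-NOT is a single location.
parallel = CNOTS

# One gate at a time: 12 waiting qubits plus the gate itself per time step.
per_step = (QUBITS - 2) + 1
serial = per_step * CNOTS

# Nearest-neighbour gates only: the SWAPs add further time steps.
nearest_neighbour = per_step * (SWAPS + CNOTS)

# Physical model: every gate is accompanied by CU-moving steps during
# which all qubits wait (interpretation assumed here).
physical = nearest_neighbour + QUBITS * CU_MOVE_STEPS * (SWAPS + CNOTS)

print(parallel, serial, nearest_neighbour, physical)
```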



B. The $\pi/8$ Gate

When constructing a universal gate set, one only needs either the Toffoli gate or the $\pi/8$ gate. The $\pi/8$ gate
typically gives a lower threshold value due to the fact that it acts on fewer qubits. However, the Toffoli, as described
below, is very useful to us because we perform so many of these gates during the coherent feedback part of error
correction.

In order to apply the $\pi/8$ gate, we first make an ancilla state, $|0_L\rangle + e^{i\pi/4}|1_L\rangle$, as detailed in [29]. This is input to
the circuit specified in Fig. 12, where the only additional component that we require is the gate N, which is specified
below.



C. The Toffoli Gate 



The Toffoli gate, or controlled-controlled-NOT, is the worst-case gate, simply because of the number of inputs 
and outputs. The typical fault-tolerant construction was first suggested by Shor [30], and has been altered to work
deterministically in the absence of measurements [HI, [29| . We might also consider an alternative construction, using 
Fig. 13. At first glance, this circuit would seem to be much worse because it includes 6 applications of T, and hence
6 applications of N, compared to only 2 applications in [29]. However, we may be able to gain some advantage by
surrounding each gate by EC circuits. In this case, all single errors are corrected within each gate. Hence the only
pairs of locations which are not benign are contained within blocks of EC-gate-EC. Therefore, this could, potentially,
represent a significant reduction in the number of non-benign pairs of locations, even if the number of locations has
increased. Note that due to our available parallelism, we will also surround all the 'wait' operations with ECs as well.

FIG. 13: Toffoli gate, constructed out of other primitives at the same level of concatenation.

FIG. 14: Alternative formulation of the Toffoli gate. The auxiliary state $|\mathrm{AND}\rangle = |000\rangle + |010\rangle + |100\rangle + |111\rangle$ must also be
fault-tolerantly constructed.

We must remember that we are limited to nearest-neighbour interactions, which means that we must keep track of 
the locations of the ancilla qubits, and count the SWAP operations required to move past these as well. It may also be 
useful to note that it does not matter which qubit the target qubit is, the number of required operations is the same 
(a swap operation moves from the start of the circuit to the end). 

It turns out that this operation is superior for a threshold under the most restrictive set of assumptions, which 
relates to our model of global control. Given that this represents an improvement, it is also relevant to ask whether it 
is more efficient to not include the Toffoli in our set of gates, and just expand it each time it is used in the EC circuit. 
This would allow us to use a gate with fewer inputs and outputs as the worst-case gate, which could, potentially 
counter-act the increase in size of the EC circuit. 



D. N 

This is by far the most complicated gate, and follows the construction of [29]. As specified in that paper, the action
of the gate is to take two inputs. One is a logical qubit, $\alpha|0_L\rangle + \beta|1_L\rangle$, and the other is a set of 7 qubits, all in the
physical $|0\rangle$ state. The output is then of the form

$$\alpha|0_L\rangle|0000000\rangle + \beta|1_L\rangle|1111111\rangle.$$

However, we have chosen to represent this as a one-qubit gate to clearly indicate the fact that, in constructing this 
state, a single error can have a catastrophic effect on the logical states, and so they can no longer be used in the 
computation (only the physical qubits can). 

The output of this gate is not a logical qubit, as with other gates. It is, instead, a classical repetition code. Given 
that we will never perform error correction on this circuit, this is perfectly allowable. Any gates that are controlled 
off this repetition code can be performed bitwise. 

In essence, the gate is constructed by making use of the observation that the codewords of the Steane code have an 
even (odd) number of $|1\rangle$s for the $|0_L\rangle$ ($|1_L\rangle$) state. Hence, performing a weak majority vote from the codeword onto 
an ancilla gives one of the seven bits required for the classical repetition code. Since we must repeat this sequence 
seven times, controlled by the same qubits in the codeword, a strong majority vote should first be used. This would 
perhaps be the way that would give the smallest threshold (see Sec. VII B), although we chose to directly follow the 
circuits given in [29], which replace the strong majority voting step with a stabilization process (syndrome extraction). 
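
The parity observation can be checked directly. The following Python sketch is illustrative only: it assumes one common choice of parity-check matrix for the [7,4] Hamming code (the qubit ordering in the circuits of [29] may differ), and the names `H`, `row_span`, `zero_L` and `one_L` are hypothetical. It enumerates the classical bit strings that appear in each Steane codeword and lists their weights.

```python
from itertools import product

# Assumed parity-check matrix of the [7,4] Hamming code
# (columns are the binary labels 1..7; other orderings are equally valid).
H = [
    (0, 0, 0, 1, 1, 1, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (1, 0, 1, 0, 1, 0, 1),
]

def row_span(rows):
    """Return all GF(2) linear combinations of the given rows."""
    words = set()
    for coeffs in product((0, 1), repeat=len(rows)):
        word = tuple(
            sum(c * row[i] for c, row in zip(coeffs, rows)) % 2
            for i in range(len(rows[0]))
        )
        words.add(word)
    return words

# Bit strings in the superposition for |0_L>, and their complements for |1_L>.
zero_L = row_span(H)
one_L = {tuple(b ^ 1 for b in w) for w in zero_L}

for label, words in (("|0_L>", zero_L), ("|1_L>", one_L)):
    print(label, "weights:", sorted(sum(w) for w in words))
```

Every string in `zero_L` has even weight and every string in `one_L` has odd weight, so the parity of the seven bits identifies the logical value.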
FIG. 15: The EC circuit, designed such that a single error during the error correction process does not affect more than one of 
the output qubits. The circuit shown only corrects for Z errors. To correct for X errors, the circuit needs to be repeated. 



VII. FAULT-TOLERANT THRESHOLD FOR THE COMPUTATION 



A. Circuit for EC 



We have to be careful about the construction of a fault-tolerant error-correcting circuit. In particular, the 
error-correcting circuits previously shown, such as in Fig. 5, are not fault-tolerant, because if an error occurs on a qubit 
while the syndrome is being measured, the error remains on this qubit, and a different qubit is 'corrected'. Instead, 
we need to create a different circuit that protects against errors of this form by using a degree of redundancy, and 
then majority voting, as described in Sec. IV B. 
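
The failure mode described above can be illustrated with a purely classical bit-flip picture. The sketch below is not the circuit of Fig. 5: it assumes a Hamming parity-check matrix whose columns are the binary labels 1 to 7, and it assumes that the three syndrome bits are extracted one after another. All names (`H`, `syndrome`, `flagged_qubit`) are hypothetical.

```python
# Assumed Hamming parity-check matrix (columns are the binary labels 1..7).
H = [
    (0, 0, 0, 1, 1, 1, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (1, 0, 1, 0, 1, 0, 1),
]

def syndrome(bits):
    """Three parity checks of a 7-bit error pattern."""
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)

def flagged_qubit(s):
    """0-based index of the qubit a syndrome points to, or None if trivial."""
    value = int("".join(str(bit) for bit in s), 2)
    return None if value == 0 else value - 1

before = [0, 0, 0, 0, 0, 0, 0]   # no error when the first check is read
after = [0, 0, 0, 0, 0, 0, 1]    # a flip then appears on qubit index 6

# The first syndrome bit was read before the fault, the other two after it.
mixed = (syndrome(before)[0],) + syndrome(after)[1:]
target = flagged_qubit(mixed)

corrected = list(after)
corrected[target] ^= 1           # the 'correction' lands on the wrong qubit

print("flagged qubit:", target, "remaining error weight:", sum(corrected))
```

A single fault during extraction therefore leaves two errors in the block, which is what the redundancy and majority voting are intended to prevent.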

In terms of the primitives defined, the circuit in Fig. 15 fulfills all the requirements of a circuit that corrects single 
errors on its inputs, and does not catastrophically propagate errors that occur during the circuit. Depending on how 
much parallelism is available (i.e. whether we can perform more than one operation simultaneously on a given block), 
the arrangement of ancillas (and in particular, the number of ancillas used) can be optimised to reduce the number 
of locations in the circuit. 

Note that a single application of this circuit (whereas full error correction requires two applications) can also be 
used to fault-tolerantly prepare an encoded qubit in the $|0_L\rangle$ state, just by supplying $|0000000\rangle$ 
in place of the logical qubit to be corrected. Given that we only need to prepare the $|0_L\rangle$ state at the beginning of any 
of our gates, we can incorporate this part in the application of the EC before the start of each gate, thereby reducing 
the required number of time steps. 

For simplicity, we have left one important part out of the circuit in Fig. 15, a part which corrects a significant oversight 
in the circuit as shown. In particular, if a single error occurs on one of the computational qubits between sets of 
controlled-NOT gates that feed the error information onto the qubits, then not only do we have a fault on that qubit, 
but we apply a correction to a different qubit. This is avoided at the start of the circuit by performing a strong 
majority vote process on each of the computational qubits which is to be read more than once. In the following 
subsection, we will learn that we are further justified in not depicting this part of the circuit because the extra gates 
are benign with respect to the rest of the circuit, and hence make a negligible difference to the threshold estimate 
that we will make (however, we must remember that this process takes a certain number of steps, and adds to the 
number of error locations on the rest of the circuit). 

With this circuit in place, we are now in a position to enumerate the number of locations for each type of gate. 
These results are given in Tab. III. We therefore take the Toffoli gate as being the one with the most locations (once 
we've placed error correcting units on each input and output). In this case $A = 447357 + 6 \times 19724 = 565701$, and an 
approximate error threshold is given by $\binom{A}{2}^{-1} = 6.2 \times 10^{-12}$. 
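
The arithmetic behind this estimate can be reproduced in a few lines. The Python sketch below uses only the location counts quoted above; it assumes the estimate is the inverse of the number of pairs of locations, and the variable names are illustrative.

```python
from math import comb

toffoli_locations = 447357   # Toffoli gate, physical model
ec_locations = 19724         # one EC block, physical model
ec_blocks = 6                # one EC on each of the 3 inputs and 3 outputs

# Total number of locations in the worst-case (Toffoli) extended gate.
A = toffoli_locations + ec_blocks * ec_locations

# Pair-counting estimate: one over the number of pairs of locations.
pairs = comb(A, 2)
threshold = 1 / pairs

print(A, pairs, f"{threshold:.2e}")
```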
Gate         Inputs &   No             No             Restricted    NN             Physical
             Outputs    restrictions   measurements   parallelism   interactions   Model

H, S, Z, X   1          7              7              49            49             196
CNOT         2          7              7              91            637            2695
T            1          237            2023           4926          10498          34018
Toffoli      3          N/A            6279           41952         123663         447357
N            1 (a)      N/A            1127           3078          5990           15692
EC           N/A        142            1486           2928          4982           19724


TABLE III: Number of locations for each gate type. As the columns go from left to right, we add more restrictions leading, 
finally, to the global control model. 



B. Benign Locations 

In [28], counting the benign locations for pairs of errors provided a significant enhancement to the fault-tolerant 
threshold. However, the task was far simpler in that paper because the gate constructions used far fewer operations 
(due to the availability of measurement, arbitrary parallelism and non-nearest-neighbour operations), and because 
the gates were constructed out of Clifford operators, which means that the effect of errors can easily be calculated by 
propagating the errors through the circuit, which can be efficiently simulated on a classical computer. As such, our 
task is far harder, and we do not intend to count all the benign pairs. However, we can make (or re-use from [28]) 
some very simple arguments. 

1. If there is a 'wait' operation, then the locations either side of it form a pair. If there are multiple wait operations 
in a row, then all possible pairs of locations are benign. 

2. This can be generalised because the circuits are constructed in such a way that if an error affects one qubit of 
the 7, then for all time it only affects this one, and its equivalent qubits in the other logical qubits of the gate. 
Hence, if a second error occurs on this qubit, it acts as only a single error, and, therefore, all pairs along these 
lines are benign. 

3. All pairs of locations in an EC that acts on the input qubits are benign [28]. 

4. Pairs of errors that occur on two different output EC blocks are benign. This is clear because there are no gates 
that can cause the two errors to be present on the same logical qubit. 

5. A single error that occurs within a strong majority vote (except for those on the output ancilla) is benign with 
respect to any single error that occurs externally to that process. This is because the majority vote clears the 
effect of that error, just leaving the single error which gets corrected. By using a cat state of at least 6 qubits, 
it can also be made benign with respect to pairs of errors that occur within the circuit. 
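
To give a feel for how such arguments enter the estimate, the following Python sketch counts the pairs covered by rules 3 and 4 for the Toffoli gate in the physical model. It is a rough illustration under stated assumptions, not the accounting actually used for the thresholds quoted in this paper: rule 3 is read as covering pairs inside a single input EC block, and the estimate is assumed to be one over the number of pairs not shown to be benign. All variable names are hypothetical.

```python
from math import comb

ec = 19724          # locations in one EC block (physical model, Tab. III)
toffoli = 447357    # locations in the Toffoli gate (physical model, Tab. III)
n_in = n_out = 3    # the Toffoli gate has three inputs and three outputs

A = toffoli + (n_in + n_out) * ec
total_pairs = comb(A, 2)

# Rule 3 (assumed reading): both errors lie inside the same input EC block.
benign_input = n_in * comb(ec, 2)

# Rule 4: the two errors lie in two different output EC blocks.
benign_output = comb(n_out, 2) * ec ** 2

benign = benign_input + benign_output

naive = 1 / total_pairs                # no benign pairs removed
refined = 1 / (total_pairs - benign)   # rules 3 and 4 removed

print(f"benign fraction: {benign / total_pairs:.4f}")
print(f"naive: {naive:.3e}  refined: {refined:.3e}")
```

Under these assumptions the two rules remove only about one percent of the pairs, which suggests that most of any improvement would have to come from the rules that act inside the gate itself.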

C. Threshold 

We are now in a position to enumerate the number of operations required for each gate, and make a first estimate 
as to the number of benign pairs. We have chosen to split this into a series of steps, building up slowly to the 
final threshold. This allows us to see where the basic costs of a global control scheme come into effect. Firstly, we 
evaluate the threshold allowing arbitrary parallelism and measurements. For this, we get a value of $8.9 \times 10^{-6}$. This 
is comparable to the value of $2.7 \times 10^{-5}$ obtained by [28], but indicates that our simple evaluation of the number of 
benign pairs actually misses a large fraction. This suggests that all further results would benefit from more rigorous 
accounting of the benign pairs. 

If we disallow measurements, then not only do our gate sequences get larger, but we also have to include the 
Toffoli gate in our set of gates. As a result, the threshold sees its most significant hit at this point, where we find 
$\epsilon_0 = 6.4 \times 10^{-9}$. If we further assume that we have only a single CU in each error correcting block, then all gates have 
to be performed sequentially, instead of in parallel, thereby adding a lot of extra 'wait' operations. These further 
reduce the threshold to $\epsilon_0 = 8.1 \times 10^{-10}$. Next, we must add in sets of swap operations so that the gates are always 
performed between nearest-neighbours and, further, even single-qubit gates must be performed on adjacent qubits. 
These leave us with a threshold of $\epsilon_0 = 2.6 \times 10^{-10}$. Recall that adding in these nearest-neighbour interactions would 
mean that the circuits are not fault-tolerant if we were to apply the swaps (although it is known how to make such a 
structure fault-tolerant [31]). However, in the present case, these gates just serve to count the number of operations 
while the CU is moving independently. Hence, this calculation is valid here, but not for a general nearest-neighbour 
scheme. 

Footnote a (Table III, gate N): This output is a classical repetition code of 7 qubits, not an encoded qubit. 

Finally, we must take into account the physical model where we actually have to perform 3 swap operations to 
move between adjacent computational qubits. As already discussed, however, this only has to be done at the lowest 
level of concatenation, which means that its contribution is not as significant as it might otherwise have been. The 
final threshold that we find is 

$$\epsilon_0 = 6.8 \times 10^{-11}.$$
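
To see where the cost of global control enters, the values quoted in this subsection can be compared step by step. The short Python sketch below uses only the numbers stated above; the stage labels are informal shorthand, not terminology fixed by the paper.

```python
# Thresholds quoted in this subsection, in the order the restrictions are added.
stages = [
    ("arbitrary parallelism and measurements", 8.9e-6),
    ("no measurements", 6.4e-9),
    ("single CU per block", 8.1e-10),
    ("nearest-neighbour interactions", 2.6e-10),
    ("physical model", 6.8e-11),
]

# Factor by which the threshold drops when each restriction is introduced.
drops = [
    (later[0], earlier[1] / later[1])
    for earlier, later in zip(stages, stages[1:])
]

for label, factor in drops:
    print(f"{label}: threshold reduced by a factor of about {factor:.1f}")

worst = max(drops, key=lambda d: d[1])
print("largest single cost:", worst[0])
```

Removing measurements costs roughly three orders of magnitude, whereas each of the later restrictions costs less than one.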

This threshold simply provides a bound; the real threshold is certainly higher. However, the intention of this 
calculation was not to optimise this bound, simply to show that a bound exists for a global control scheme. In 
particular, theoretical calculations of thresholds, such as [11], [28], give results several orders of magnitude worse 
than current numerical computations indicate. For example, while [28] calculates a threshold of $2.7 \times 10^{-5}$, numerical 
estimates [32] put the threshold for similar assumptions closer to 0.03. We estimate that, had we used the strong 
majority voting version of the gate N, instead of the version of [29], the threshold would have been approximately 
$10^{-10}$, although making an accurate count of the number of benign locations is more demanding. 

VIII. FAULT-TOLERANT THRESHOLD FOR THE CLASSICAL STATES 

To finish the argument, we must now demonstrate fault-tolerant error correcting circuits for the classical states 
and calculate their threshold, $\epsilon_c$. We expect such a threshold to be significantly larger because the encoding is much 
smaller (only 3 qubits as opposed to 7), and because our ECs only have to correct for one type of error: bit-flips. 
Phase-flips don't affect the final result. Although only 3 CUs may be active (per block), we have to remember that, 
actually, we are trying to stabilise all 7 of the CUs associated with that level of concatenation. 

As before, we must construct a list of the required operations, and build them out of the circuit primitives. In 
this case, the most costly operation will involve moving a CU's resting place from one SS to its neighbour. At all 
levels of concatenation except the lowest, the move operation involves four controlled-NOT gates (two in each SS) and 
seven SWAP gates (to move the CU from one SS to the other). However, in our previous calculation of the threshold, 
we did not decompose the two-qubit gate into the one-qubit gate steps; we merely took it as a primitive. The CU's 
manipulation of SSs is exactly the same as a two-qubit gate, and so we shall count each of these as a primitive. At 
the lowest level of concatenation, we must also move past all the buffer qubits, requiring 28 SWAP operations instead. 

Before and after this move operation, we must apply error correction to the CUs. This involves changing the array 
of CUs available (one step), and then performing the procedures described in Sec. III. These procedures correct one 
of the three CUs, and hence must be repeated three times to ensure that errors do not propagate catastrophically. 
Subsequently, we use these 3 to reset all the CUs from the previous level of concatenation. This step is necessary 
because we are not operating, in the end, on encoded qubits, but with the single CUs and must therefore propagate 
the extra stability due to this round of error correction to all the CUs. As a result, we have a total of $\sim 1800$ locations 
at which errors can occur. This gives $1.6 \times 10^{6}$ pairs of locations at which faults could occur. Making no effort to 
enumerate the benign pairs, we simply quote $\epsilon_c = 6 \times 10^{-7}$, realising that this is far less demanding than the value 
of $7 \times 10^{-11}$, above, as we expected. If we have a physical implementation in which we can match the main threshold, 
$\epsilon_0$, and the classical states obey a threshold $\epsilon_c$, then we can implement approximately 




operations between each phase of error correction on the CUs. Hence, the classical threshold will have an insignificant 
contribution to the overall threshold, justifying our exclusion of it from the full calculation. We have skipped over 
arguing that the operations we perform uphold the requirements of non-propagation of single errors. One could most 
simply justify that this can be done by moving to a 5-bit code instead of a 3-bit code. That way, even if a single 
error occurs during the error correction step, and discounting the particular bit being corrected, there are still 3 other 
unaffected bits forming a majority, and hence single errors can be prevented from propagating. 
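
The counting for the classical states, and the reason a 5-bit code is the simpler choice, can be sketched as follows. This Python fragment is an illustration under stated assumptions rather than the procedure of Sec. III: the bit being corrected is assumed to be refreshed from a plain majority vote over the remaining bits, and `refresh_value` is a hypothetical helper.

```python
from math import comb

# Pair counting for the classical states, using the ~1800 locations quoted above.
locations = 1800
pairs = comb(locations, 2)
eps_c = 1 / pairs
print(pairs, f"{eps_c:.1e}")

def refresh_value(bits, target, faulty):
    """Value used to reset bits[target]: the majority of the other bits,
    one of which (index `faulty`, different from `target`) has been flipped.
    Returns None when the vote is tied."""
    others = [b ^ (i == faulty) for i, b in enumerate(bits) if i != target]
    ones = sum(others)
    zeros = len(others) - ones
    if ones == zeros:
        return None
    return int(ones > zeros)

# 5-bit code: three unaffected bits still outvote the single faulty one.
print(refresh_value([1, 1, 1, 1, 1], target=0, faulty=3))

# 3-bit code: one good bit against one faulty bit gives no majority.
print(refresh_value([1, 1, 1], target=0, faulty=2))
```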
IX. SUMMARY AND CONCLUSIONS 

The main result of this paper is simply stated - that a globally controlled architecture has a fault-tolerant threshold 
which is a positive number. In achieving this result, we have made two basic assumptions. Firstly, we have treated 
the classical and quantum states as two distinct sections, and only errors on the computational qubits contribute to 
the final threshold. We have justified this by also calculating a threshold for the classical bits, which is much larger (i.e. far less demanding). 
However, a more elegant approach would be to combine the two elements. The second, implicit, assumption is that 
our computer was correctly initialised with the required patterning of classical states. One might expect that our 
fault-tolerant protocols would enable us to correctly initialise these patterns from some smaller initial configuration 
but there is, as yet, no rigour behind these expectations. 

Given that little effort was expended in optimising the threshold, one might expect that significant improvements 
can be made to the calculated value. One is also given hope, since it was observed in [13] that moving to a global control 
scheme allows alterations to (in this case) typical optical lattice schemes, such as changing from red-detuned lasers 
to blue-detuned, which brings significant (order of magnitude) benefits to some decoherence mechanisms, thereby 
compensating for some of the cost of moving to such a scheme. 

An open question that still remains is whether, for other global control schemes, the basis states (such as those 
required for use with cellular automata) can be stabilised. Our results only apply at the level of logical qubits, not 
the underlying physical model, except that the two coincide for our chosen model. 

This work was supported by Clare College, Cambridge and the European Commission through the Integrated 
Projects SCALA (CT-015714) and QAP (IST-3-015848). 